Skip to content

fix(web): drop stale queued-message ghosts on at-bottom refresh#811

Merged
tiann merged 2 commits into
tiann:mainfrom
flyinghanger:fix/queued-message-ghosts
Jun 6, 2026
Merged

fix(web): drop stale queued-message ghosts on at-bottom refresh#811
tiann merged 2 commits into
tiann:mainfrom
flyinghanger:fix/queued-message-ghosts

Conversation

@flyinghanger

Copy link
Copy Markdown
Contributor

Problem

Two old queued-message cards stayed pinned above the composer forever and could never be dismissed. Investigation showed the hub DB had zero rows with `invoked_at IS NULL` — the cards were pure client-side residue, not real queued work.

Root cause

A queued user message (`invokedAt === null`) is only ever cleared by the live `messages-consumed` SSE (which flips `invokedAt` via `markMessagesConsumed`). That event is one-shot. If the client is offline/closed when the CLI consumes the message, the signal is lost permanently. On reload:

  1. The row is restored from `sessionStorage` still flagged queued (`invokedAt: null`).
  2. The server's now-invoked copy is too old to appear in the latest fetch window, so `mergeMessages` never corrects it.
  3. `trimPreservingQueued` deliberately never evicts queued rows — so it's pinned forever as a phantom card.

Fix

Reconcile the rebuilt window against the authoritative latest page in `fetchLatestMessages` (at-bottom path). An immediate, server-echoed, locally-queued message whose id is absent from the server response is a ghost and is dropped.

False-positive guards (kept even when absent from the response):

  • optimistic rows (`id === localId`) — the server echo may still be in flight; `mergeMessages` owns their reconciliation.
  • scheduled rows (`scheduledAt != null`) — the hub omits not-yet-mature scheduled messages from `getMessages`, so absence is expected.

A genuinely-still-queued immediate message sorts by `createdAt` to the top of the latest page, so it is always present in the fetched window and never dropped.

Tests

  • New integration test reproducing the ghost + its removal on at-bottom refresh.
  • New pure-function test for `reconcileQueuedAgainstLatest` covering all three keep-guards + the drop case.
  • Full web suite: 696 passed; `tsc --noEmit` clean.

A queued user message (invokedAt === null) is only cleared by the live
messages-consumed SSE. If the client is offline/closed when the CLI
consumes it, that one-shot event is lost; on reload the row is restored
from sessionStorage still flagged queued, the server's invoked copy is
too old to appear in the latest window so mergeMessages never corrects
it, and trimPreservingQueued pins it forever as a phantom card above the
composer.

Reconcile the rebuilt window against the authoritative latest page in
fetchLatestMessages: an immediate, server-echoed, locally-queued message
absent from the server response is a ghost and is dropped. Optimistic
rows (echo may be in flight) and scheduled rows (hub omits not-yet-mature
messages) are kept.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • [Major] Stale latest-page responses can drop real queued rows. The reconciliation runs against prev.messages at apply time but uses response.messages from the older HTTP snapshot, so a message-received SSE that arrives while fetchLatestMessages is in flight can add a server-echoed queued row (id !== localId) that is absent from that snapshot and gets filtered as a ghost.

Summary

  • Review mode: initial
  • One correctness issue found in the queued-message reconciliation path. Residual risk: this race is timing-dependent, so it needs a generation/snapshot test around concurrent fetchLatestMessages + SSE ingestion.

Testing

  • Not run (automation)

HAPI Bot

Comment thread web/src/lib/message-window-store.ts Outdated
// trimVisible preserves every queued row, so a ghost (queued locally
// but already invoked server-side, missed messages-consumed while
// offline) would otherwise be pinned forever.
const reconciled = reconcileQueuedAgainstLatest(merged, response.messages)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Major] reconcileQueuedAgainstLatest compares the current window to the response from the request's start. fetchLatestMessages captures initial before awaiting api.getMessages, but the updater later reads the latest prev.messages; if a message-received SSE lands in between, ingestIncomingMessages can replace the optimistic row with a server-echoed queued row (id !== localId). That row is absent from the older response.messages, so this line filters it out as a ghost and the queued bar loses real work.

Suggested fix:

const queuedIdsAtRequestStart = new Set(
    [...initial.messages, ...initial.pending]
        .filter((msg) => isQueuedForInvocation(msg) && msg.scheduledAt == null && !isOptimisticMessage(msg))
        .map((msg) => msg.id)
)

const reconciled = reconcileQueuedAgainstLatest(merged, response.messages, queuedIdsAtRequestStart)

and keep newly-arrived queued rows in the helper:

if (!queuedIdsAtRequestStart.has(msg.id)) return true

Review (PR tiann#811): reconcileQueuedAgainstLatest ran against the latest
page snapshot captured at the fetch request's start, but the reconcile
itself executes in the updater against the live window. A message-received
SSE landing while fetchLatestMessages was in flight could insert a real
server-echoed queued row (id != localId) absent from that older snapshot,
which was then filtered out as a ghost — dropping genuine queued work.

Capture the immediate, server-echoed queued ids at request start and only
reconcile those; rows that arrived after the fetch was issued are exempt.
@flyinghanger

Copy link
Copy Markdown
Contributor Author

Thanks — confirmed and fixed in 8ca2c11.

The race was real: reconcileQueuedAgainstLatest executed inside the updater against the live window, but response.messages is the snapshot from the request's start. A message-received SSE landing mid-fetch could insert a real server-echoed queued row (id !== localId) that the snapshot never saw, and it got filtered as a ghost.

Fix follows the suggested approach: capture the immediate, server-echoed queued ids at request start (queuedReconcileCandidateIds(initial.messages, initial.pending)) and gate the helper on them — only rows already queued when the fetch was issued are reconcile candidates; anything that arrives mid-fetch is exempt.

Added a regression test (keeps a queued row that arrives via SSE while the latest fetch is in flight) that resolves the HTTP fetch via a deferred and ingests a fresh queued row in the gap, asserting the ghost is dropped while the fresh row survives. Full web suite green (665).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings
No findings.

Summary
Review mode: follow-up after new commits
No issues found in the latest diff. Residual risk: web store behavior remains timing-sensitive around concurrent HTTP refresh and SSE updates, but the added regression coverage targets the prior bot finding.

Testing
Not run (automation: bun is not installed in this runner PATH)

HAPI Bot

@tiann tiann merged commit f4c3513 into tiann:main Jun 6, 2026
2 checks passed
yoyoworms added a commit to yoyoworms/hapi that referenced this pull request Jun 10, 2026
…or ACP

Merge tiann/hapi upstream (50 commits, v0.20.0 + v0.20.1) into the
liuxin fork. Notable adoptions:

- POST /sessions/:id/reopen + web Reopen button (tiann#826), archive
  metadata preservation (tiann#825), clearSessionArchiveMetadata /
  restoreSessionArchiveMetadata on sessionCache
- runner self-restart resilience under external supervision (tiann#814)
- session export (tiann#808), scratchlist v1.1 (tiann#798), voice backend
  enabled (Gemini Live / Qwen Realtime, tiann#692/tiann#742)
- cursor legacy→ACP auto-migration machinery (tiann#799/tiann#824/tiann#835/tiann#844)
- stale queued-message ghost fix (tiann#811), composer send-error
  restore path, mermaid/remark-math web fixes

Conflict resolutions (32 files) — fork features preserved:
- shared/models.ts: adopt upstream `fable`/`fable[1m]` aliases
  (verified working via `claude --model fable -p`), drop our
  `claude-fable-5` long-form ids, keep opus-4-6/4-7 1M pins
- syncEngine.resumeSession: keep fork Restart behavior (archive
  active session then respawn — web Restart depends on it) over
  upstream's return-success-if-active; keep resumeWithSessionId
  override, --continue fallback, auto-discovery scan, and
  retry-without-token; adopt upstream's cursor auto-migrate hook,
  never-started fresh-spawn path, and the no-set-session-config
  fix; adapt upstream's 11-arg spawnSession call to our 13-arg
  signature (sandbox + continueLatest)
- sessions routes: keep fork /resume-options + /pin alongside
  upstream /reopen + /migrate-to-acp
- sessionBase.onSessionFound: merged signature
  (sessionId, extras?, sessionFilePath?) serving upstream's cursor
  metadata extras and our transcript-path scanner callback
- useSendMessage.onError: upstream's attachment-aware composer
  restore + fork's clearTurnLock/pauseQueue queue stop
- SessionChat: keep fork InactiveSessionBanner (one-click resume +
  browse) over upstream's plain-text banner; adopt voice
  integration; keep recent-12h section + pin + compact postTokens
- web/server.ts: keep 100MB body cap + options-object push routes;
  adopt voice WS proxies and codexDesktop routes
- telegram bot: adopt upstream formatReadyNotification

Validation: typecheck green (cli/web/hub); 983/984 tests pass.
The one failure is runner.integration stress test (spawn/stop 20
concurrent sessions) — environmental on this loaded dev machine;
not in the deploy script's focused test set.

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants